Holistic Query Evaluation over Information Extraction Pipelines
Authors
Abstract
We introduce holistic in-database query processing over information extraction pipelines. This requires considering the joint conditional distribution over generic Conditional Random Fields that use factor graphs to encode extraction tasks. Our approach introduces Canopy Factor Graphs, a novel probabilistic model for effectively capturing the joint conditional distribution given a canopy clustering of the data, and special query operators for retrieving resolution information. Since inference on such models is intractable, we introduce an approximate technique for query processing, along with optimizations that cut across the integrated tasks to reduce the required processing time. Effectiveness and scalability are verified through an extensive experimental evaluation using real and synthetic data.
PVLDB Reference Format: Ekaterini Ioannou and Minos Garofalakis. Holistic Query Evaluation over Information Extraction Pipelines. PVLDB, 11(2): 217-229, 2017. DOI: 10.14778/3149193.3149201
Similar resources
CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description
The CUNY-BLENDER team participated in the following tasks in TAC-KBP2010: Regular Entity Linking, Regular Slot Filling and Surprise Slot Filling task (per:disease slot). In the TAC-KBP program, the entity linking task is considered as independent from or a pre-processing step of the slot filling task. Previous efforts on this task mainly focus on utilizing the entity surface information and the...
Towards Holistic Web-Based Information Retrieval: An Agent-Based Approach
This paper presents an agent-based system for bolstering holistic information retrieval via the WWW. In Ellis’ holistic model of information seeking behaviors, the information seeking activities include: selection of sources, browsing and differentiating, monitoring as well as extraction. Through the use of a query processing agent (QPA), information filtering agents (IFAs) and information moni...
Experiences using F# for developing analysis scripts and tools over search engine query log data
We describe our experience using the programming language F# for analysis of text query logs from the Bing search engine. The goals of the project were to develop a set of scripts for enabling ad-hoc query analysis, clustering and feature extraction as well as to provide a subset of these within a data exploration tool developed for non-programmers. Where appropriate we describe programming pat...
Top-Down and Bottom-Up: A Combined Approach to Slot Filling
The Slot Filling task requires a system to automatically distill information from a large document collection and return answers for a query entity with specified attributes (‘slots’), and use them to expand the Wikipedia infoboxes. We describe two bottom-up Information Extraction style pipelines and a top-down Question Answering style pipeline to address this task. We propose several novel app...
An Effective Path-aware Approach for Keyword Search over Data Graphs
Keyword Search is known as a user-friendly alternative for structured languages to retrieve information from graph-structured data. Efficient retrieving of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in the keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
Journal title:
- PVLDB
Volume 11, Issue
Pages -
Publication date: 2017